EDA Modules

In [1]:
import pandas as pd
In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/rromanss23/Machine_Leaning_Engineer_Udacity_NanoDegree/master/projects/boston_housing/housing.csv')
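Before handing the frame to the report generators, it can help to confirm that the download produced the columns you expect. The following is a minimal sketch, not part of the original notebook: `check_frame` is a hypothetical helper, the column names are taken from the AutoViz output further down in this notebook, and the sample values are made up so the example runs without network access.

```python
import pandas as pd

# Column names as they appear in the AutoViz output later in this notebook.
EXPECTED_COLUMNS = ["RM", "LSTAT", "PTRATIO", "MEDV"]

def check_frame(df, expected=EXPECTED_COLUMNS):
    """Hypothetical helper: summarize basic problems before running EDA reports."""
    missing = [c for c in expected if c not in df.columns]
    non_numeric = [
        c for c in expected
        if c in df.columns and not pd.api.types.is_numeric_dtype(df[c])
    ]
    return {
        "rows": len(df),
        "missing_columns": missing,
        "non_numeric_columns": non_numeric,
        "null_cells": int(df.isna().sum().sum()),
    }

# Made-up values for illustration only (not rows from housing.csv).
sample = pd.DataFrame({
    "RM": [6.5, 6.4],
    "LSTAT": [5.0, 9.1],
    "PTRATIO": [15.3, None],   # one missing value
    "MEDV": ["high", "low"],   # deliberately non-numeric
})
print(check_frame(sample))
```

In the notebook itself you would call `check_frame(df)` on the frame loaded above.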

I - Pandas Profiling

In [8]:
# !pip install pandas_profiling
In [9]:
# Import the module
from pandas_profiling import ProfileReport
In [23]:
# Generate the report
profile = ProfileReport(df, title='Boston house pricing')
In [11]:
# Show the report
# profile.to_widgets()
profile.to_file("Pandas_Profile.html")
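The `pandas_profiling` package was later renamed `ydata-profiling` (import name `ydata_profiling`), so the import above may fail in a fresh environment. A hedged sketch of a fallback import follows. `load_profile_report` is a hypothetical helper name, and it returns `None` when neither package can be imported.

```python
import importlib

def load_profile_report():
    """Hypothetical helper: return ProfileReport from whichever package imports, else None."""
    for module_name in ("ydata_profiling", "pandas_profiling"):
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Usually ImportError when the package is absent; a broken or
            # outdated install may raise something else, so we skip it too.
            continue
        return getattr(module, "ProfileReport", None)
    return None

ProfileReport = load_profile_report()
if ProfileReport is None:
    print("Neither ydata_profiling nor pandas_profiling could be imported.")
```

If a class is returned, it is used the same way as in the cells above: `ProfileReport(df, title=...)` followed by `to_file(...)`.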

II - Sweetviz

In [13]:
# !pip install sweetviz
In [14]:
# Import the module
import sweetviz as sv
In [15]:
# Generate the report
my_report = sv.analyze(df)
In [16]:
# The report can be exported to HTML or previewed in the notebook:

my_report.show_html()                # Exports to HTML
# my_report.show_notebook()              # Previews in the notebook
Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.

IV - DataPrep

In [20]:
#! pip install dataprep
In [21]:
from dataprep.eda import create_report
In [26]:
report = create_report(df, title = 'Boston house pricing')
/home/mato/jupyter/jupyterenv/lib/python3.10/site-packages/dask/core.py:119: RuntimeWarning: invalid value encountered in divide
  return func(*(_execute_task(a, cache) for a in args))
In [28]:
report.save('dataprep_report')
Report has been saved to dataprep_report.html!

V - Autoviz

In [35]:
#!pip install autoviz
In [37]:
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()
# AutoViz reads the CSV from the URL itself; note that this reassigns df
df = AV.AutoViz('https://raw.githubusercontent.com/rromanss23/Machine_Leaning_Engineer_Udacity_NanoDegree/master/projects/boston_housing/housing.csv')
Shape of your Data Set loaded: (489, 4)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
Data cleaning improvement suggestions. Complete them before proceeding to ML modeling.
         Nuniques  dtype    Nulls  Nullpercent  NuniquePercent  Value counts Min  Data cleaning improvement suggestions
LSTAT    442       float64  0      0.000000     90.388548       0
RM       430       float64  0      0.000000     87.934560       0
MEDV     228       float64  0      0.000000     46.625767       0
PTRATIO  44        float64  0      0.000000     8.997955        0
    4 Predictors classified...
        No variables removed since no ID or low-information variables found in data set
Number of All Scatter Plots = 10
No categorical or numeric vars in data set. Hence no bar charts.
All Plots done
Time to run AutoViz = 2 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
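The Nullpercent and NuniquePercent figures in the AutoViz table above are consistent with plain ratios against the 489 rows (for example, 442 / 489 ≈ 90.39% for LSTAT). Under that assumption, the same columns can be sketched with pandas alone. `column_summary` is a hypothetical helper, not an AutoViz function, and the sample values are made up.

```python
import pandas as pd

def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column unique/null counts and percentages."""
    n_rows = len(df)
    summary = pd.DataFrame({
        "Nuniques": df.nunique(),          # distinct non-null values per column
        "dtype": df.dtypes.astype(str),
        "Nulls": df.isna().sum(),
    })
    # Assumed definitions: counts divided by the number of rows, as percentages.
    summary["Nullpercent"] = summary["Nulls"] / n_rows * 100
    summary["NuniquePercent"] = summary["Nuniques"] / n_rows * 100
    return summary.sort_values("Nuniques", ascending=False)

# Made-up values for illustration only (not rows from housing.csv).
sample = pd.DataFrame({
    "RM": [6.5, 6.4, 7.1, 6.9],
    "PTRATIO": [15.3, 17.8, 17.8, 18.7],
})
print(column_summary(sample))
```

Calling `column_summary(df)` on the full frame should give values comparable to the table above if the assumed definitions match what AutoViz computes.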
In [42]:
#!pip install jupyter_contrib_nbextensions
In [52]:
#!pip install -U nbconvert==5.6.1